Filtering High-Dimensional Methylation Marks With Extremely Small Sample Size: An Application to Gastric Cancer Data
Abstract
DNA methylations in critical regions are highly involved in cancer pathogenesis and drug response. However, identifying causal methylations out of a large number of potential polymorphic DNA methylation sites is challenging. This high-dimensional data brings two obstacles: first, many established statistical models are not scalable to so many features; second, multiple testing and overfitting become serious. To this end, a method to quickly filter candidate sites and narrow down targets for downstream analyses is urgently needed. BACkPAy is a pre-screening Bayesian approach to detect biologically meaningful patterns of potential differential methylation levels with small sample size. BACkPAy prioritizes potentially important biomarkers by the Bayesian false discovery rate (FDR) approach. It filters non-informative sites (i.e., non-differential), which show a flat methylation pattern across experimental conditions. In this work, we applied BACkPAy to a genome-wide methylation dataset with three tissue types, where each type contains three gastric cancer samples. We also applied LIMMA (Linear Models for Microarray and RNA-Seq Data) to compare its results with what we achieved by BACkPAy. Then, Cox proportional hazards regression models were utilized to visualize prognostically significant markers with The Cancer Genome Atlas (TCGA) data for survival analysis. Using BACkPAy, we identified eight biologically meaningful patterns/groups of differential probes from the DNA methylation dataset. Using TCGA data, we also identified five prognostic genes (i.e., predictive of the progression of gastric cancer) that contain some differential methylation probes, whereas no significant result was identified using the Benjamini-Hochberg FDR in LIMMA. We showed the importance of using BACkPAy for the analysis of DNA methylation data with extremely small sample size in gastric cancer. We revealed that RDH13, CLDN11, TMTC1, UCHL1, and FOXP2 can serve as predictive biomarkers for gastric cancer treatment, and the promoter methylation level of these five genes in serum could have prognostic and diagnostic functions in gastric cancer patients.
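The abstract reports that the Benjamini-Hochberg FDR correction left no significant probes. As a rough illustration of why that can happen when very many sites are tested, the following is a minimal sketch of the standard Benjamini-Hochberg step-up procedure in plain Python. It is not BACkPAy's Bayesian FDR and not LIMMA's implementation; the function name `benjamini_hochberg` and the example p-values are assumptions made for illustration only.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard BH step-up rule (illustrative sketch, hypothetical helper).

    Returns a list of booleans, True where the hypothesis is rejected
    while controlling the false discovery rate at level alpha.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Ten tests: the two smallest p-values pass their rank thresholds.
few_tests = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(few_tests))

# Made-up example: the same p = 0.001 among 1000 tests is no longer rejected,
# because its threshold shrinks to 0.05 / 1000.
many_tests = [0.001] + [0.5] * 999
print(any(benjamini_hochberg(many_tests)))
```

With only a handful of samples per group, per-site p-values rarely become small enough to clear thresholds of this kind across a genome-wide set of probes, which is the setting in which a pre-screening filter is motivated.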
Similar resources
Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach
Feature selection can be decisive when analyzing high-dimensional data, especially with a small number of samples. Feature extraction methods do not perform well under these conditions. With small sample sets and high-dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...
An Efficient Dimensionality Reduction Approach for Small-sample Size and High-dimensional Data Modeling
As massive multidimensional data are being generated in a wide range of emerging applications, this paper introduces two new dimension reduction methods for processing and modeling small-sample-size, high-dimensional data. By combining the support vector machine (SVM) and recursive feature elimination (RFE), the SVM-RFE algorithm is proposed to select features, and further, ad...
T3-Plot for Testing Spherical Symmetry for High-Dimensional Data with a Small Sample Size
High-dimensional data with a small sample size, such as microarray data and image data, are commonly encountered in practical problems in which many variables have to be measured but it is too costly or time-consuming to repeat the measurements many times. Analysis of this kind of data poses a great challenge for statisticians. In this paper, we develop a new graphical method for test...
Shrinkage-based diagonal Hotelling's tests for high-dimensional small sample size data
High-throughput expression profiling techniques bring novel tools and also statistical challenges to genetic research. In addition to detecting differentially expressed genes, testing the significance of gene sets or pathway analysis has been recognized as an equally important problem. Owing to the "large p small n" paradigm, the traditional Hotelling's T² test suffers from the singularity p...
Multi-dimensional data construction method with its application to learning from small-sample-sets
Insufficient training data is one of the major problems in neural network learning, because it leads to poor learning performance. In order to enhance an intelligent learning process, it is necessary to exploit the features of the problem from the available information even with limited scale. Due to the shortcomings of the existing methods for data generation; and also in general, a problem is...
Journal
Journal title: Frontiers in Genetics
Year: 2021
ISSN: 1664-8021
DOI: https://doi.org/10.3389/fgene.2021.705708